Method and apparatus for tracking a moving object

ABSTRACT

A method of tracking an object moving relative to a scene includes generating a background image by acquiring a plurality of images of the scene and comparing a pair of comparison images with the background image. Potential objects are matched by comparing spatial characteristics of potential objects of a following image of the pair of comparison images with spatial characteristics of potential objects of a preceding image of the pair of comparison images to match potential objects in the following image to potential objects in the preceding image and each matched potential object is treated as a moving object to thereby track the movement of each moving object relative to said scene.

FIELD OF THE INVENTION

The present invention relates to a method and apparatus for tracking a moving object.

BACKGROUND OF THE INVENTION

There is an extensive literature concerned with the analysis of changing images and with the characterization of motion within images. Much of this work is based on the concept of optical flow. This allows the continuity equation of fluid dynamics to be applied directly to image processing problems. Thus much of the extensive corpus of knowledge dealing with the manipulation of the partial differential equations used in fluid dynamics can be brought to bear on image processing problems.

This approach has some serious shortcomings. The underlying assumption that the domain of investigation, in this case the image intensity function, is everywhere continuous and differentiable is rarely true in practice. The image intensity function is usually a function on a discrete domain of subscript values, not on a continuum, and rapid changes in intensity called “edges” are common in most images of interest. Such rapid changes in intensity preclude the use of Taylor's theorem in establishing the underlying equations of the optical flow method. The practical consequence of this is that optical flow methods have not proven to be particularly successful in the processing of images. This is hardly surprising when one considers that the very features that facilitate the alignment of images at an intuitive level, that is, sharp edges, have been sacrificed at the outset.

Another approach is to divide the image domain into zones and detect the presence or absence of intensity differences within each zone. This method is limited by the coarseness of the subdivision, which decreases the spatial resolution, and by its inability to independently track two different objects when they pass one another.

Further, changes in pixel intensity can also be brought about by causes other than objects moving in the scene. In particular, changes in scene illumination will cause significant changes in image pixel intensity. These intensity changes are then passed to downstream algorithms and give rise to spurious “moving objects” or false alarms. To some extent brightness-compensating cameras ameliorate the problem but they cannot compensate for brightness changes that vary across the image.

SUMMARY OF THE INVENTION

The invention provides a method of tracking an object moving relative to a scene, the method including:

(a) generating a background image by acquiring a plurality of images of the scene, dividing each image into a plurality of background image elements, determining a value of a characteristic of each said background image element for each image to obtain a set of values, and determining from said set of values a background value for each of said background image elements;

(b) comparing a pair of comparison images with said background image by:

(i) acquiring a pair of comparison images, dividing each said comparison image into a plurality of comparison image elements corresponding to respective ones of the background image elements, and determining a comparison value of said characteristic for each of said comparison image elements;

(ii) comparing each said comparison value with said background value of said corresponding background image element to determine whether said comparison value is sufficiently similar to said background value to thereby determine whether or not each said comparison image element is sufficiently similar to the corresponding background image element; and

(iii) determining, from which comparison image elements are not sufficiently similar to the corresponding background image element, which comparison image elements relate to potential objects and determining a spatial characteristic of each said potential object;

(c) matching potential objects in said pair of comparison images by comparing spatial characteristics of potential objects of a following image of said pair of comparison images with spatial characteristics of potential objects of a preceding image of said pair of comparison images to match potential objects in the following image to potential objects in the preceding image; and

(d) treating each matched potential object as a moving object to thereby track the movement of each said moving object relative to said scene.

The invention also provides a method of reducing the effect of variations in intensity of illumination of a scene, the method including:

converting intensity values of image elements of images acquired of said scene to a logarithmic scale of intensity to thereby form a log-intensity image; and

filtering said log-intensity image with a high-pass spatial filter in order to create an output image of said scene which is independent of the illumination of the scene.

The invention also provides a method of tracking an object added to or removed from a scene, the method including:

(a) generating a series of background images of the scene by acquiring for each background image a plurality of images of said scene, dividing each said image into a plurality of background image elements, determining a value of a characteristic of each background image element for each image to obtain a set of values and determining from said set of values a background value for each of said background image elements;

(i) comparing each new background value with an old background value of a corresponding background image element of an old background image which precedes said new background image by a number of background images to determine whether said new value is sufficiently similar to said old value to thereby determine whether or not each new image element is sufficiently similar to the corresponding old image element;

(ii) determining, from which new image elements are not sufficiently similar to the corresponding old image element, which new image elements relate to potential objects and determining a spatial characteristic of each said potential object;

(b) matching potential objects in consecutive background images by comparing spatial characteristics of potential objects of a following background image of said consecutive images with spatial characteristics of potential objects of a preceding background image of said consecutive background images to match potential objects in the following image to potential objects in the preceding image; and

(c) treating each matched potential object as an object added to or removed from the scene.

The invention also provides apparatus for tracking an object moving relative to a scene, the apparatus including:

(a) image acquisition means;

(b) background image generation means for generating a background image from a plurality of images of the scene acquired by said image acquisition means, said background image generation means dividing each image into a plurality of background image elements, determining a value of a characteristic of each said background image element for each image to obtain a set of values, and determining from said set of values a background value for each of said background image elements;

(c) image comparison means for comparing a pair of comparison images with said background image by:

(i) dividing each image of a pair of comparison images acquired by said image acquisition means into a plurality of comparison image elements corresponding to respective ones of the background image elements, and determining a comparison value of said characteristic for each of said comparison image elements;

(ii) comparing each said comparison value with said background value of said corresponding background image element to determine whether said comparison value is sufficiently similar to said background value to thereby determine whether or not each said comparison image element is sufficiently similar to the corresponding background image element; and

(iii) determining, from which comparison image elements are not sufficiently similar to the corresponding background image element, which comparison image elements relate to potential objects and determining a spatial characteristic of each said potential object;

(d) object matching means for matching potential objects in said pair of comparison images by comparing spatial characteristics of potential objects of a following image of said pair of comparison images with spatial characteristics of potential objects of a preceding image of said pair of comparison images to match potential objects in the following image to potential objects in the preceding image; and

(e) object tracking means which treats each matched potential object as a moving object to thereby track the movement of each said moving object relative to said scene.

The invention also provides apparatus for reducing the effect of variations in intensity of illumination of a scene, the apparatus including:

logarithmic conversion means for converting intensity values of image elements of images acquired of said scene to a logarithmic scale of intensity to thereby form a log-intensity image; and

a high-pass spatial filter for filtering said log-intensity image in order to create an output image of said scene which is independent of the illumination of the scene.

The invention also provides apparatus for tracking an object added to or removed from a scene, the apparatus including:

(a) image acquisition means;

(b) background image generation means for generating a series of background images of the scene, said background image generation means generating each background image from a plurality of images of said scene acquired by said image acquisition means, said background image generation means dividing each said image into a plurality of background image elements, determining a value of a characteristic of each background image element for each image to obtain a set of values and determining from said set of values a background value for each of said background image elements;

(i) background comparison means for comparing each new background value with an old background value of a corresponding background image element of an old background image which precedes said new background image by a number of background images to determine whether said new value is sufficiently similar to said old value to thereby determine whether or not each new image element is sufficiently similar to the corresponding old image element;

(ii) determining, from which new image elements are not sufficiently similar to the corresponding old image element, which new image elements relate to potential objects and determining a spatial characteristic of each said potential object;

(c) object matching means for matching potential objects in consecutive background images by comparing spatial characteristics of potential objects of a following background image of said consecutive images with spatial characteristics of potential objects of a preceding background image of said consecutive background images to match potential objects in the following image to potential objects in the preceding image; and

(d) object tracking means which treats each matched potential object as an object added to or removed from the scene.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the present invention may be understood, examples of an embodiment of two aspects of the invention will be described with reference to the accompanying drawings. In the drawings:

FIG. 1 shows a background image of a street scene;

FIG. 2 shows a new image which contains a human figure;

FIG. 3 shows the Boolean pixel array that is formed when the new image is tested against the range images;

FIG. 4 shows a Boolean tile array formed from the Boolean pixel array;

FIG. 5 shows a smoothed Boolean tile array formed by smoothing the Boolean tile array;

FIG. 6 shows an image of a scene in overcast conditions;

FIG. 7 shows an image of the same scene in sunlit conditions and with a person in the foreground;

FIG. 8 shows an image which is obtained when the image of FIG. 6 is converted using a logarithmic intensity scale;

FIG. 9 shows an image obtained when the image of FIG. 7 is similarly converted;

FIG. 10 shows an image obtained when the image of FIG. 8 is convolved with a high-pass integrating filter; and

FIG. 11 shows an image obtained when the image of FIG. 9 is similarly treated.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiment of the first aspect of the present invention is particularly suited to cases where the moving objects account for only a small proportion of the intensity variance of the image. The process operates upon a regularly timed sequence of images of one or more objects moving against a static background. The image data generated by an image acquisition means in the form of a camera or video recorder is passed by means of a video capture card and its associated software to a computer programmed to carry out the object tracking method. Software to carry out the method can be generated by any skilled programmer and therefore a description of suitable software is not included herein.

Where the objects being tracked are humans, 5 to 6 frames per second is sufficient to enable an object to be tracked. The frame rate can be varied in accordance with the anticipated maximum velocity of the objects being tracked.

A background image such as that shown in FIG. 1 is prepared by a background generation means from a sequence of images of the scene. Each image is divided into a plurality of background image elements. The background image elements are chosen to be each pixel of the image but could be a group of pixels. A background value of a characteristic of the background image is determined for each pixel. In the preferred embodiment, the characteristic of the background image is chosen to be the intensity of the image. Thus there is a time series of intensity values for each pixel. Each value in the time series corresponds to one image in the sequence. This time series is binned to form a sample distribution of intensities for the given pixel and the mode, or most common value, of intensity is found for each pixel. This value is assigned to the corresponding pixel in the background image, which thus constitutes background data against which other images can be compared. The mean or median can be used but are not as efficient. Objects that are moving in the scene will affect the values in the outlying bins of the sample distribution of intensities as they pass by the pixel but they will not, in general, affect the mode. Hence most moving objects will be invisible and only fixed and slowly moving objects will affect the background data represented by the background image. That is, the background image is effectively an array of background values which can be used as a basis of comparison.

Two “confidence limit images” or “range images” are derived by the background image generation means from the background image: a “high range image” and a “low range image”. For each pixel in the background image, an intensity value is added to form the high range image and an intensity value is subtracted to form the low range image. The value to be added and subtracted may be a constant derived from a prior knowledge of the camera noise statistics or it may be estimated for each pixel from the sample distribution of intensity values for that pixel by the usual methods of estimating confidence limits.

More specifically, the modal and upper and lower confidence limit images are calculated as follows. A modal image is constructed from a sequence of (say) 50 images as follows. For each pixel, an array of (say) 64 “bins” is set up and the bins are assigned zero value. The intensity value (which must lie between 0 and 255) for that pixel for each image is used to calculate a bin number, j, by dividing the intensity, I, by 4, and bin(j) is incremented by one. After 50 images have been processed in this way there is a numerical distribution of intensity for each pixel. The mode (most common value) of intensity for each pixel is found to form a background or “modal” image. The mode is chosen rather than the mean because outliers caused by objects moving past the pixel corrupt the mean. This gives rise to a “double exposure” effect in the background image. On the other hand, outliers do not affect the mode provided there are not too many of them, and passing objects are rendered invisible. The median would probably do just as well but the mode is easier to calculate.

When the mode is being computed, the upper and lower confidence limits for each pixel are calculated at the same time from the same numerical distribution of intensity values by finding the root mean square of the difference between the binned intensities and the mode. This root mean square value is very close to, but not quite the same thing as, the standard deviation, the latter being the root mean square difference from the mean. Adding and subtracting twice the root mean square to and from the mode gives the upper and lower confidence limits respectively. These values will be similar to the 95% confidence limits found by assuming a Gaussian distribution and using twice the sample standard deviation, or those found using Student's t-test. Because of the presence of outliers, which equate to objects which the technique of the present invention detects, a Gaussian assumption is not justified.
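
The following is a minimal NumPy sketch of the modal and range image computation just described. The 50-frame window and the 64-bin histogram come from the text; the function name, array layout and bin-centre convention for the mode are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def modal_and_range_images(frames, n_bins=64, k=2.0):
    """Estimate a background (modal) image plus low/high range images.

    frames: sequence of HxW uint8 grey-scale images (e.g. 50 of them).
    n_bins: number of intensity bins (64 bins of width 4 for 0..255).
    k:      multiple of the RMS deviation used for the confidence limits
            (the text uses twice the root mean square).
    """
    stack = np.stack(frames).astype(np.int32)      # (N, H, W)
    bin_width = 256 // n_bins                      # 4 for 64 bins
    bins = stack // bin_width                      # bin index per pixel, per frame

    # Per-pixel histogram of bin indices: shape (H, W, n_bins).
    hist = np.zeros(stack.shape[1:] + (n_bins,), dtype=np.int32)
    rows, cols = np.indices(stack.shape[1:])
    for b in bins:                                 # accumulate one frame at a time
        np.add.at(hist, (rows, cols, b), 1)

    # Mode: centre intensity of the most populated bin for each pixel.
    centres = np.arange(n_bins) * bin_width + bin_width // 2
    mode = centres[hist.argmax(axis=-1)]

    # RMS difference between the binned intensities and the mode
    # (close to, but not the same as, the standard deviation).
    sq = (centres[None, None, :] - mode[..., None]) ** 2
    rms = np.sqrt((hist * sq).sum(axis=-1) / hist.sum(axis=-1))

    low = np.clip(mode - k * rms, 0, 255)          # low range image
    high = np.clip(mode + k * rms, 0, 255)         # high range image
    return mode, low, high
```

Here `mode` plays the role of the background image and `low`/`high` the two range images against which new frames are tested.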

Once the two background range images have been formed, each new comparison image acquired by the image acquisition means, such as that shown in FIG. 2, is compared by an image comparison means with the background image to find potential new or moving objects in the scene. This is done pixel by pixel, i.e. each pixel in the comparison image has its intensity determined by the image comparison means and compared with the intensity of the corresponding pixel in the background image to determine whether the comparison value of intensity is sufficiently similar to the background value of intensity. If the intensity of a pixel in the incoming frame lies between the intensities of the same pixel in the low range and high range images then the pixel is designated as a “False” pixel. If it lies outside the range then the pixel is designated as a “True” pixel, that is, one which is significantly different from the background image. This process generates a two-dimensional Boolean pixel array of “True” and “False” pixels. The Boolean pixel array generated by the image in FIG. 2 is shown in FIG. 3. The Boolean pixel array contains a large area of True cells 1 which are caused by the presence of the human figure in FIG. 2, i.e. by the presence of a moving object. It also contains True cells 2 which are due to noise in the new image. Initially each pixel that falls outside the range is presumed to relate to an object. Further processing of the pixels by a noise reduction means and the object matching means progressively eliminates pixels falling outside the range that do not relate to objects.
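
In array terms the range test reduces to a single vectorised comparison. A minimal sketch, assuming the low and high range images computed above:

```python
def boolean_pixel_array(frame, low, high):
    """True where the pixel falls outside the [low, high] range images,
    i.e. differs significantly from the background; False otherwise."""
    return (frame < low) | (frame > high)
```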

The image plane is then uniformly subdivided by the noise reduction means into groups of pixels called tiles, where each tile contains an array of adjacent pixels. The Boolean pixel array is then used to generate a Boolean tile array by designating those tiles with more than, say, one half of their pixels True as “True” tiles and the rest as “False”. This tiling process is not essential but it greatly increases the speed of the ensuing steps and also reduces noise by reducing the number of image elements which are presumed to relate to objects and hence reducing the number of image elements treated as objects. The Boolean tile array derived from the Boolean pixel array of FIG. 3 is shown in FIG. 4.
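
A sketch of the tiling step, assuming a NumPy Boolean pixel array whose dimensions are multiples of the tile size; the 8-pixel tile is an illustrative choice, not specified by the text:

```python
def boolean_tile_array(pixels, tile=8, frac=0.5):
    """Mark a tile True when more than `frac` of its pixels are True."""
    h, w = pixels.shape
    # Reshape into (tile_rows, tile, tile_cols, tile) blocks and take the
    # fraction of True pixels within each block.
    blocks = pixels.reshape(h // tile, tile, w // tile, tile)
    return blocks.mean(axis=(1, 3)) > frac
```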

The Boolean tile array may be smoothed by the noise reduction means to form a smoothed Boolean tile array. A new tile array is prepared which has the same dimensions as the Boolean tile array. Each cell in the Boolean tile array is tested to see how many of the surrounding tiles are True. If the number exceeds a preset threshold the corresponding cell in the smoothed Boolean tile array is set to True, and to False otherwise. This is similar to passing a two-dimensional moving average filter over numerical data. It is not essential but helps to eliminate protrusions such as shadows. The smoothed Boolean tile array derived from FIG. 4 is shown in FIG. 5.
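
A sketch of the smoothing step; the neighbour count of 8 surrounding tiles and the threshold of 4 are illustrative stand-ins for the preset threshold mentioned above:

```python
import numpy as np

def smooth_tile_array(tiles, threshold=4):
    """Set a cell True when more than `threshold` of its 8 neighbours
    are True, akin to a 2-D moving average followed by a cut."""
    p = np.pad(tiles, 1).astype(np.int32)     # zero border avoids edge cases
    neighbours = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
                  p[1:-1, :-2] +               p[1:-1, 2:] +
                  p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:])
    return neighbours > threshold
```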

The tiles in the smoothed Boolean tile array are then formed into groups of proximate tiles by the noise reduction means. A True tile that lies within a predefined distance of a given True tile is allocated to the same “space group”. Each space group is treated as an object by an object matching means. True tiles do not necessarily have to touch to belong to the same group (hence they are “proximate” rather than “contiguous”). The proximity thresholds, the predefined distances in the x and y directions which define proximity, may be functions of position in the image and of the scene geometry. They may be made larger near the bottom of the image where objects in the scene are closer to the camera. The proximity thresholds may also be varied according to the type of object being viewed and may be set differently if vehicles are being viewed rather than people.
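
One plausible way to form space groups is a breadth-first walk over the proximity graph of True tiles. The fixed dx, dy thresholds below are a simplification; as noted above they may vary with image position and scene geometry:

```python
import numpy as np

def space_groups(tiles, dx=2, dy=2):
    """Cluster True tiles into space groups: two True tiles belong to the
    same group when they lie within dx, dy of one another (they need not
    touch). Returns a list of groups of (row, col) tile coordinates."""
    unvisited = set(zip(*np.nonzero(tiles)))
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            r, c = frontier.pop()
            near = [t for t in unvisited
                    if abs(t[0] - r) <= dy and abs(t[1] - c) <= dx]
            for t in near:                    # grow the group outwards
                unvisited.remove(t)
                group.append(t)
                frontier.append(t)
        groups.append(group)
    return groups
```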

Once the space groups have been formed, their statistical properties are computed by the object matching means in order to obtain a spatial characteristic for each object. These properties include the centroid of the group and its spatial standard deviations in the x and y directions in units of inter-pixel distance.
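
A short sketch of these statistics, assuming each space group is held as a list of (row, column) coordinates as in the grouping sketch above:

```python
import numpy as np

def group_statistics(group):
    """Centroid and spatial standard deviations of a space group,
    in units of inter-tile (or inter-pixel) distance."""
    pts = np.asarray(group, dtype=float)
    return pts.mean(axis=0), pts.std(axis=0)
```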

Tens of thousands of pixel values in the incoming frame are reduced by 3 or 4 orders of magnitude to a much smaller number of space group statistics; bitmap data has been converted to vector data.

Each space group is assumed to correspond to one or more real objects moving in the scene. Since solid objects are constrained by the laws of physics to move smoothly from frame to frame, proximity in time is as important as proximity in space when tracking real objects. For this reason space groups, which are by their nature associated only with a single frame, are incorporated into “time groups” which persist over many frames. The space groups of each new frame are compared with time groups derived from preceding frames to see if any matches can be found. A time group may consist simply of a single space group from a single frame or it may persist for many consecutive frames. Time groups which do not persist for a number of frames are discarded and not treated as objects. In the preferred embodiment time groups are discarded if they do not persist for more than three frames.

A space group is allocated to a time group if the centroid of the space group is sufficiently close to the centroid of the time group and if the spatial standard deviations of the space group are sufficiently close to the spatial standard deviations of the time group. Space groups which are matched to time groups are used by the object tracking means to track moving objects.

If a single match is found by the object matching means the space group is added to that time group. If more than one space group matches a time group the one with the best match is chosen. If no space groups match an existing time group, the time group is killed off. If no time groups match a space group a new time group is started. When a space group is added to a time group the statistics of the space group are used to update the statistics of the time group. The centroid of the space group is added to a list of coordinate pairs associated with the time group and its spatial variance data are incorporated into the spatial variance data of the time group. The distance matching thresholds are not absolute but are based on the spatial size of the groups as summarized by their spatial standard deviations. The size matching criterion is based on a “matching factor” which is a function of the ratio of the spatial standard deviations of the space group to those of the candidate time group. More specifically, the centroid and spatial standard deviation of each space group is calculated first. The centroid and spatial standard deviations of each time group are calculated from the sums and sums of squares of its member space groups.

The spatial standard deviations are the square roots of the spatial variances in the x and y directions.

When a candidate space group is tested for a match with a given time group it has to pass all four of the following tests before it is considered as a possible match:

1. Is the x-coordinate of the centroid of the space group (i.e. the mean x-value of the group) closer than the x-distance-tolerance to the x-coordinate of the centroid of the time group?

2. Is the y-coordinate of the centroid of the space group closer than the y-distance-tolerance to the y-coordinate of the centroid of the time group?

3. Is the x-size-factor less than the size factor threshold?

4. Is the y-size-factor less than the size factor threshold?

These tests can be expressed as follows:

(x_s − x_t)² < G max(s_x, t_x)  (1)

(y_s − y_t)² < G max(s_y, t_y)  (2)

|t_x − s_x|/(t_x + s_x) < F  (3)

|t_y − s_y|/(t_y + s_y) < F  (4)

where x_s is the x-coordinate of the centroid of the space group, x_t is the x-coordinate of the centroid of the time group, s_x is the x-variance of the space group, t_x is the x-variance of the time group, G is a constant between 2 and 10 and F is a constant between 0.5 and 1.5. The y-direction quantities are similarly defined.

Optimum values of G and F have been found by the inventors to be 6.0 and 1.0 respectively.
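
Tests (1) to (4) transcribe directly into code. The tuple layout of the group statistics below is an assumption for illustration, with G and F defaulted to the inventors' preferred values:

```python
def matches(space, time, G=6.0, F=1.0):
    """Tests (1)-(4). space and time are (x_centroid, y_centroid,
    x_variance, y_variance) tuples for a space group and a time group."""
    xs, ys, sx, sy = space
    xt, yt, tx, ty = time
    return ((xs - xt) ** 2 < G * max(sx, tx) and       # test (1)
            (ys - yt) ** 2 < G * max(sy, ty) and       # test (2)
            abs(tx - sx) / (tx + sx) < F and           # test (3)
            abs(ty - sy) / (ty + sy) < F)              # test (4)
```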

Time groups can be visualized as groups of True tiles in a three-dimensional space in which the dimensions are the pixel coordinates and the frame number. When a time group is terminated, that is, when no space group can be found which matches it or when it has persisted for a preset number of frames (usually ten frames), the time group is converted to a line segment. This involves the object tracking means fitting a straight line to the list of space group centroid coordinates associated with the time group.

Each line segment is a straight line in (x, y, t) space. The slope of this line provides an estimate of the velocity of an object moving in the scene to which the line segment is assumed to correspond.
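
One plausible realisation of the fit is a pair of independent least-squares fits of x and y against frame number, whose slopes give the velocity estimate. This is a sketch, not necessarily the fitting method of the embodiment:

```python
import numpy as np

def fit_line_segment(centroids, frames):
    """Fit a straight line in (x, y, t) space to a time group's centroids.

    centroids: list of (x, y) pairs; frames: matching frame numbers.
    Returns ((vx, vy), (x0, y0)): per-frame velocity and intercepts,
    so the position at frame t extrapolates as (x0 + vx*t, y0 + vy*t).
    """
    t = np.asarray(frames, dtype=float)
    xy = np.asarray(centroids, dtype=float)
    vx, x0 = np.polyfit(t, xy[:, 0], 1)   # degree-1 fit: slope, intercept
    vy, y0 = np.polyfit(t, xy[:, 1], 1)
    return (vx, vy), (x0, y0)
```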

Just as space groups were built up into time groups, line segments are built up into lines or “trajectories” by the object tracking means. When a line segment is formed it is projected backwards in time to see if it lies close to the most recent segment of an existing trajectory. If more than one trajectory is found the closest is chosen. If no suitable trajectories are found then the line segment is made the first side of a new trajectory. When a new line segment is projected backward it is also checked to see if it intersects the boundaries of the scene. If so, an “entrance” has occurred and a new trajectory is started. Likewise the most recent side of each existing trajectory is projected forward in time to see if it intersects a boundary. If it does, an “exit” has occurred and the trajectory is terminated and excluded from further consideration. When matching a line segment to a trajectory, a test is also carried out to see if the concatenation involves an unrealistically high acceleration. If so, the concatenation is abandoned.

It will thus be understood that a trajectory represents the movement of an object and also consists of a series of line segments. Each line segment corresponds to a time group. The length of the time group will vary depending on the number of frames for which the time group persisted. In the preferred embodiment, time groups which persist for three or fewer frames are discarded on the assumption that if they do not persist for a sufficient time they cannot relate to a moving object. The length of each time group is capped at ten frames. The number of frames chosen as the maximum length for each time group is chosen in dependence on two factors:

(1) the number of centroids which are used to produce a line segment, and hence the extent to which the line segment correctly approximates the position of the centroids; and

(2) the smoothness of the resulting trajectory.

It has been found that ten frames provides a good balance between these factors. However, the choice of frames will be dependent on the application. For example, if many images are taken of slowly moving objects, the length of the resulting line segments would have less effect on the curve of the trajectory and hence more frames could be used to make up each time group.

A trajectory tracks the motion of an individual object in the scene. At any given time, t, the spatial coordinates, (x, y), of the centroid of the object can be calculated or predicted by extrapolating the preceding line segment. This position information can be used to mark the object being tracked for display purposes or to keep a record of the movement of a particular object in the scene. It can also be transmitted from the computer in order to control another device such as an automatic pan-tilt head fitted with a second, high-resolution camera.

When objects pass one another so that their space groups temporarily coalesce and then separate again, the extrapolation of line segments allows both the objects to be tracked and their separate identities maintained.

Each trajectory summarises the size, shape and kinetic behavior of a single object in the scene. The characteristics of a trajectory and/or its component line segments allow objects and behaviors to be distinguished and selectively reported. Thus trajectories with line segments having velocities above a preset threshold may indicate the presence of persons running, fighting or skateboarding. A trajectory enduring for longer than a preset threshold time would indicate a person loitering in the scene. The size of the line segments as recorded in their spatial standard deviations is also a distinguishing characteristic and allows adults to be distinguished from children and animals, vehicles to be distinguished from people and vehicle types to be distinguished from one another.

It should be understood that the present invention is not confined to the above-described embodiment.

The term “intensity” which is used in the above description may be generalised to include any other quantifiable characteristic of an image pixel, such as hue or saturation.

In a real environment the background image needs to be continually refreshed to allow for slow changes in the scene. A new background image is created every 100 frames or so, from which moving objects have been largely removed. This provides an opportunity to detect static objects that have recently appeared in the scene or disappeared from the scene. Each new background image (rather than each new comparison image) is compared by a background image comparison means with a pair of range images that have been saved some time previously. Typically the third last background image is used. In this case the process need only be taken as far as computing the line segments. That is, each background image is compared first with a background image which precedes it in time by three background images to determine changes in the scene in the intervening period. That is, the mode of each new background image is compared with the confidence limits of the old background image to determine whether there have been any changes. Each change is treated as a slow moving object. The potential slow moving objects are then matched with potential slow moving objects of preceding background images to determine whether a change has persisted over consecutive frames. Any object which persists sufficiently long (typically for three background images) is recognised as being a change to the scene. A changed static object in the scene will give rise to a new line segment that persists in consecutive background images. The size of the static object to which it corresponds may be estimated as before, allowing parcels to be distinguished from vehicles etc.

Another problem that besets this method of image comparison is the occurrence of pixels that are noisy for environmental reasons: for example, shiny objects reflecting moving clouds, wind-blown vegetation, the shadows of wind-blown vegetation and the shadows of tall buildings moving with the sun's motion across the sky. The algorithms described above for constructing space groups and time groups are sufficiently robust to allow pixels to be masked with little deterioration in performance. Pixels from problem areas can be made self-masking by computing the proportion of time for which each pixel is True. On a sufficiently long time scale any given pixel should only be True for a small proportion of the time. Any that are not can be designated as “noisy” pixels and excluded from consideration when the Boolean pixel array is formed by a masking means.
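
A minimal sketch of the self-masking idea: keep a running count of how often each pixel tests True and suppress pixels whose duty cycle exceeds a limit. The class layout and the 0.2 limit are illustrative assumptions:

```python
import numpy as np

class PixelMask:
    """Flags pixels that are True too often to relate to moving objects."""

    def __init__(self, shape, limit=0.2):
        self.true_count = np.zeros(shape, dtype=np.int64)
        self.frames = 0
        self.limit = limit                 # maximum acceptable duty cycle

    def update(self, boolean_pixels):
        """Accumulate one frame's Boolean pixel array."""
        self.true_count += boolean_pixels
        self.frames += 1

    def apply(self, boolean_pixels):
        """Zero out self-masked ('noisy') pixels in the Boolean array."""
        noisy = self.true_count / max(self.frames, 1) > self.limit
        return boolean_pixels & ~noisy
```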

The second aspect of the invention relates to a high-pass spatial filtering process suitable for use in combination with the object tracking method of the first aspect of the invention. In the embodiment of the second aspect of the invention, a background image shown in FIG. 6 and a current image shown in FIG. 7 were first converted to a new scale to give the images shown in FIGS. 8 and 9 respectively. Each image is composed of a plurality of pixels and the shading of the pixels represents their intensity value. The function used to convert the intensity value of each pixel, I, in each image to log-intensity, L, was defined as:

L = m log(I) + c  for 4 ≤ I < 256

and L = I  for 0 ≤ I < 4

where m = 42/log(2) and c = −80.

The function was made linear for very small intensity values to avoid the numerical complications of large negative values of L as I approached zero. The parameters m and c were chosen so that intensity values between 4 and 255 mapped into the same range (4 to 255) for computational convenience. The conversion is made computationally efficient by computing 256 table entries only once at the outset and then finding the appropriate value of L for each pixel by table look-up.
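
The look-up table might be built as follows. With m = 42/log(2), m log(I) equals 42 log₂(I), so L(4) = 42·2 − 80 = 4 and L(255) ≈ 255.8 (clipped to 255), matching the stated 4-to-255 mapping. The rounding and clipping details are illustrative assumptions:

```python
import numpy as np

# Build the 256-entry conversion table once at start-up.
M = 42.0 / np.log(2.0)                            # so M*ln(I) = 42*log2(I)
C = -80.0

table = np.empty(256, dtype=np.uint8)
table[:4] = np.arange(4)                          # linear below I = 4
i = np.arange(4, 256, dtype=float)
table[4:] = np.clip(np.rint(M * np.log(i) + C), 0, 255)

def to_log_intensity(image):
    """Convert a uint8 grey-scale image to the log-intensity scale
    with one table look-up per pixel."""
    return table[image]
```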

Next the rescaled images shown in FIGS. 8 and 9 were filtered by convolving them, first with the filter coefficients, a_i, in Table 1 in the horizontal direction, and then with the filter coefficients, b_j, in Table 2 in the vertical direction.

TABLE 1

a₀   a₁   a₂   a₃   a₄   a₅   a₆   a₇   a₈
−1   −8   −12   8   26    8   −12  −8   −1

TABLE 2

b₀   b₁   b₂   b₃   b₄   b₅   b₆
 1    6   15   20   15    6    1

Finally the logarithmically scaled, filtered images were decimated by 2 both horizontally and vertically to give the images shown in FIGS. 10 and 11. Because the filtering process gave rise to negative values of intensity, the intensities were adjusted to lie between 0 and 255 for display purposes. In these figures the value zero is represented by middle grey, negative values are darker than this and positive values are lighter. Comparison of FIGS. 10 and 11 shows that only real differences in the scene, i.e. those due to the figure in the foreground, are apparent and that the effects of the different illumination conditions have been removed.
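
A sketch of the separable filtering and decimation using the Table 1 and Table 2 coefficients; the convolution routine and edge handling (mode='same') are illustrative stand-ins for whatever the embodiment used:

```python
import numpy as np

# Table 1: zero-sum, high-pass coefficients (horizontal direction).
A = np.array([-1, -8, -12, 8, 26, 8, -12, -8, -1], dtype=float)
# Table 2: purely integrating binomial coefficients (vertical direction).
B = np.array([1, 6, 15, 20, 15, 6, 1], dtype=float)

def filter_and_decimate(log_image):
    """Convolve rows with A, then columns with B, then decimate by 2."""
    img = log_image.astype(float)
    h = np.apply_along_axis(np.convolve, 1, img, A, mode='same')
    v = np.apply_along_axis(np.convolve, 0, h, B, mode='same')
    return v[::2, ::2]
```

Note that A sums to zero (no response to uniform illumination) while B sums to 64, so a display routine would rescale the result about middle grey as the text describes.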

It should be evident from the above that this aspect of the invention stems from the realization that changing the illumination of a scene changes the intensities of differently coloured objects in the image in a constant ratio. On the other hand, most image processing operations are additive and subtractive rather than multiplicative. By converting intensity, I, to the scaled logarithm of intensity, L, this difficulty is overcome. If the scene illumination increases, the values of I for each pixel should all increase by a constant ratio and it follows that the values of L for each pixel should all increase by a constant increment.

In practice the illumination of different parts of a scene may not change by the same amount over the entire scene. Nevertheless objects that are close together in the scene will, in general, experience similar illumination changes. It follows that pixels that are close together in the image should take values of L that vary by the same absolute amount and that, in general, the differences in L between neighbouring pixels will be constant and independent of the illumination of the scene.

It follows that the logarithmic rescaling of an image as described above, followed by convolution with any differencing filter, will give rise to a new image which is independent of the illumination of the scene. Unfortunately the simple differencing of neighbouring pixels will greatly decrease the signal to noise ratio of the image. L is always positive, so the difference, ΔL, is usually much less than L itself. On the other hand the variance of the camera noise in the difference is the sum of the noise variances of the individual pixels. In general this relative increase in noise will be intolerable.

This problem can be overcome by combining the differencing operation with an integrating operation whereby the L values of a number of neighbouring pixels are added together in order to increase the signal to noise ratio of the resulting sum. This is the major function of the convolving filters whose coefficients are listed in Tables 1 and 2. The filter of Table 2 is purely an integrating filter. The filter of Table 1 is similar but the coefficients add to zero. Thus the z-transform has a zero at 1+0i and the Fourier transform is zero at zero frequency. This means that the filter is a broadband, high-pass filter that detects broad edges in the horizontal direction. Experiments showed that there was no need for high-pass filtering in the vertical direction as well as in the horizontal direction. Such additional filters tended to exaggerate the presence of rooftops and fence lines in an undesirable way.

The use of the filters had the effect of blurring the image. This blurring allows the image to be decimated without leading to the aliasing effects that would occur in the absence of filtering. The effects of camera vibration are also greatly reduced. Decimation in this way leads to improvements in the speed of downstream processing algorithms since there are fewer pixels to be processed in each frame.

It should be appreciated that this aspect of the invention is not limited to the above-described embodiment. In particular, the filter coefficients were chosen specifically for distinguishing human-scale objects positioned meters or a few tens of meters from the camera. Many other high-pass integrating filters are possible and may work better in different applications. Integration and decimation may not be desirable where high resolution is required. In well-lit environments in which intensity changes are uniform across the scene, the average log-intensity of the whole image can be subtracted from the log-intensity for each pixel and any other high-pass filtering can be dispensed with.

The filter desirably meets the following conditions:

1. Its transfer function should be zero at zero frequency, i.e. at z = 1 + 0i in the z-plane;

2. It should minimize the variance of intensity over the time coordinate relative to the variance in intensity over the spatial coordinates, in order to minimize the effect of camera noise relative to the “signal”; and

3. It should be well behaved and not give rise to aliasing and Gibbs effects.

Condition 1 implies that the coefficients must add to zero. Condition 2 is best achieved by having a single negative coefficient with a large absolute value balancing the remaining coefficients, which all take the same small positive value. Condition 3 is best achieved by convolving a one-dimensional filter which satisfies the first two conditions with simple one-dimensional Gaussian or Pascal's triangle coefficients. The convolution is carried out first in one direction and then in the other to give a two-dimensional spatial filter with the desired characteristics. The order of the convolutions in the x and y directions is irrelevant.
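
The construction can be checked mechanically. The sketch below builds a candidate one-dimensional filter in the way just described (a single large negative coefficient balanced by equal positive ones, smoothed with Pascal's triangle coefficients) and verifies condition 1; the particular sizes are illustrative guesses, not the Table 1 design:

```python
import numpy as np
from math import comb

def build_candidate_filter(n_pos=8, smooth=5):
    """Condition 2: one coefficient of weight -n_pos balanced by n_pos
    coefficients of +1, so the sum is zero (condition 1). Condition 3:
    convolve with a row of Pascal's triangle to keep the response well
    behaved and suppress ringing."""
    base = np.ones(n_pos + 1)
    base[(n_pos + 1) // 2] = -float(n_pos)     # large negative centre value
    pascal = np.array([comb(smooth - 1, j) for j in range(smooth)], float)
    coeffs = np.convolve(base, pascal)
    assert abs(coeffs.sum()) < 1e-9            # zero at zero frequency (z = 1)
    return coeffs
```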

The first and second aspects of the invention may be combined by employing the high-pass spatial filter of the second aspect of the invention each time an image, whether that be a background image or a comparison image, is acquired. These filtered images can thus be used as the basis of the comparison in the determination of which pixels are True and False pixels, and hence to determine True and False tiles and carry out the remaining steps of the first aspect of the invention.

Herein, the embodiments of aspects of the methods of the invention have been described as being carried out by a computer programmed with software written to carry out the various steps of the method. However, it will be understood that equivalent hardware could be employed to carry out the invention. Further, it will be understood that steps such as determining a characteristic value for each image element may be performed by a separate means which embodies a sub-routine, and this sub-routine may be shared by other means to carry out this part of the method.

CLAIMS

1. A method of tracking an object moving relative to a scene, the method including: (a) generating a background image by acquiring a plurality of images of the scene, dividing each image into a plurality of background image elements, determining a value of a characteristic of each said background image element for each image to obtain a set of values, and determining from said set of values a background value for each of said background image elements, wherein determining said background value involves determining the mode of said set of values; (b) comparing a pair of comparison images with said background image by: (i) acquiring a pair of comparison images, dividing each said comparison image into a plurality of comparison image elements corresponding to respective ones of the background image elements, and determining a comparison value of said characteristic for each of said comparison image elements; (ii) comparing each said comparison value with said background value of said corresponding background image element to determine whether said comparison value is sufficiently similar to said background value to thereby determine whether or not each said comparison image element is sufficiently similar to the corresponding background image element; and (iii) determining, from which comparison image elements are not sufficiently similar to the corresponding background image element, which comparison image elements relate to potential objects and determining a spatial characteristic of each said potential object; (c) matching potential objects in said pair of comparison images by comparing spatial characteristics of potential objects of a following image of said pair of comparison images with spatial characteristics of potential objects of a preceding image of said pair of comparison images to match potential objects in the following image to potential objects in the preceding image; and (d) treating each matched potential object as a moving object to thereby track the movement of each said moving object relative to said scene.
2. A method of tracking an object as claimed in claim 1, wherein determining said background value of said characteristic involves determining a range of background values, and said comparison value is determined to be sufficiently similar to said background value if said comparison value is within said range.
3. A method of tracking an object as claimed in claim 2, wherein said range of background values is determined by calculating upper and lower confidence limits about the mode of said set of values.
4. A method of tracking an object as claimed in claim 1, wherein said pair of comparison images are consecutive images in a series of comparison images and each comparison image is compared with said background image to determine which comparison image elements relate to potential objects and wherein potential objects in a following image are matched to potential objects in a plurality of preceding images before being treated as a moving object.
5. A method of tracking an object as claimed in claim 4, wherein the spatial characteristics of potential objects of a plurality of preceding images are combined to produce a combined spatial characteristic for each potential object and the spatial characteristic of the following object is compared to the combined spatial characteristic.
6. A method as claimed in claim 4, wherein the number of the plurality of preceding images is about ten.
7. A method as claimed in claim 1, wherein determining which comparison image elements relate to objects includes processing each comparison image element which is not sufficiently similar to the corresponding background image element to reduce the effect that noise has on image elements being treated as objects.
8. A method as claimed in claim 7, wherein said image elements are pixels and said pixels are grouped into tiles made up of an array of pixels so that said image is divided into a plurality of tiles, and wherein said processing involves determining whether or not the majority of comparison image elements in said tile are sufficiently similar and treating each tile which has a majority of image elements which are not sufficiently similar as an object.
9. A method as claimed in claim 8, including determining whether each said tile is sufficiently related to others of said tiles to be an object, grouping said sufficiently related tiles, and treating said sufficiently related group of tiles as said object.
10. A method as claimed in claim 1, wherein said spatial characteristic includes the centroid of the object.
11. A method as claimed in claim 10, wherein said combined spatial characteristic includes spatial standard deviations from the centroid calculated for said plurality of preceding images.
12. A method as claimed in claim 1, wherein said spatial characteristic includes a measure of the size of the object.
13. A method as claimed in claim 1, wherein objects which are not matched to objects in the preceding image are treated as potential new objects.
14. A method as claimed in claim 1, wherein objects which have been matched to objects in the preceding frame for a predetermined number of frames are converted to a line segment, by fitting a straight line between the centroids of each object.
15. A method as claimed in claim 14, wherein said line segments are joined to form trajectories showing the movement of said object across said scene, by determining whether a following line segment is sufficiently related to a preceding line segment.
16. A method as claimed in claim 1, further including masking undesirable image elements by determining whether the comparison value of an image element is not sufficiently similar to the background value too frequently to relate to a moving object and excluding said unreliable image elements from further processing.
17. A method of tracking an object added to or removed from a scene, the method including: (a) generating a series of background images of the scene by acquiring for each background image a plurality of images of said scene, dividing each said image into a plurality of background image elements, determining a value of a characteristic of each background image element for each image to obtain a set of values and determining from said set of values a background value for each of said background image elements, wherein determining a value of a characteristic for each image element involves determining the mode of a set of values obtained for each image element from said plurality of images; (i) comparing each new background value with an old background value of a corresponding background image element of an old background image which precedes said new background image by a number of background images to determine whether said new value is sufficiently similar to said old value to thereby determine whether or not each new image element is sufficiently similar to the corresponding old image element; (ii) determining, from which new image elements are not sufficiently similar to the corresponding old image element, which new image elements relate to potential objects and determining a spatial characteristic of each said potential object; (b) matching potential objects in consecutive background images by comparing spatial characteristics of potential objects of a following background image of said consecutive images with spatial characteristics of potential objects of a preceding background image of said consecutive background images to match potential objects in the following image to potential objects in the preceding image; and (c) treating each matched potential object as an object added to or removed from the scene.
18. A method of tracking an object as claimed in claim 17, wherein confidence limits are assigned to each value and the value of a new background image element is sufficiently similar to the value of a corresponding old background image element if the mode of a new image element falls within the confidence limits of the old image element.
19. A method as claimed in claim 17, wherein old background images precede new background images by three background images and matched potential objects are treated as new objects if they persist for three consecutive background images.

20. Apparatus for tracking an object moving relative to a scene, the apparatus including: (a) image acquisition means; (b) background image generation means for generating a background image from a plurality of images of the scene acquired by said image acquisition means, said background image generation means dividing each image into a plurality of background image elements, determining a value of a characteristic of each said background image element for each image to obtain a set of values, and determining from said set of values a background value for each of said background image elements, wherein determining said background value involves determining the mode of said set of values; (c) image comparison means for comparing a pair of comparison images with said background image by: (i) dividing each image of a pair of comparison images acquired by said image acquisition means into a plurality of comparison image elements corresponding to respective ones of the background image elements, and determining a comparison value of said characteristic for each of said comparison image elements; (ii) comparing each said comparison value with said background value of said corresponding background image element to determine whether said comparison value is sufficiently similar to said background value to thereby determine whether or not each said comparison image element is sufficiently similar to the corresponding background image element; and (iii) determining, from which comparison image elements are not sufficiently similar to the corresponding background image element, which comparison image elements relate to potential objects and determining a spatial characteristic of each said potential object; (d) object matching means for matching potential objects in said pair of comparison images by comparing spatial characteristics of potential objects of a following image of said pair of comparison images with spatial characteristics of potential objects of a preceding image of said pair of comparison images to match potential objects in the following image to potential objects in the preceding image; and (e) object tracking means which treats each matched potential object as a moving object to thereby track the movement of each said moving object relative to said scene.
21. Apparatus for tracking an object as claimed in claim 20, wherein said background image generation means' determination of said background value of said characteristic involves determining a range of background values, and said image comparison means determines that said comparison value is sufficiently similar to said background value if said comparison value is within said range.
22. Apparatus for tracking an object as claimed in claim 21, wherein said background image generation means determines said range of background values by calculating upper and lower confidence limits about the mode of said set of values.
23. Apparatus for tracking an object as claimed in claim 20, wherein said pair of comparison images are consecutive images in a series of comparison images acquired by said image acquisition means and each comparison image is compared by said image comparison means with said background image to determine which comparison image elements relate to potential objects and wherein said object matching means matches potential objects in a following image to potential objects in a plurality of preceding images and said object tracking means treats each object matched to a plurality of preceding potential objects as a moving object.
24. Apparatus for tracking an object as claimed in claim 23, wherein said object matching means combines spatial characteristics of potential objects of a plurality of preceding images to produce a combined spatial characteristic for each potential object and matches the spatial characteristic of the following object to the combined spatial characteristic.
25. Apparatus as claimed in claim 23, wherein the number of the plurality of preceding images is about ten.

26. Apparatus as claimed in claim 20, wherein said image comparison means includes noise reduction means for processing each comparison image element which is not sufficiently similar to the corresponding background image element to reduce the effect that noise has on image elements being treated as objects.
27. Apparatus as claimed in claim 26, wherein said image elements are pixels and said noise reduction means groups pixels into tiles made up of an array of pixels so that said image is divided into a plurality of tiles, and wherein said noise reduction means determines whether or not the majority of comparison image elements in said tile are sufficiently similar and said image comparison means treats each tile which has a majority of image elements which are not sufficiently similar as a potential object.
28. Apparatus as claimed in claim 27, wherein said noise reduction means determines whether each said tile is sufficiently related to others of said tiles to be an object, grouping said sufficiently related tiles, and said image comparison means treats said sufficiently related group of tiles as a potential object.
29. Apparatus as claimed in claim 20, wherein said spatial characteristic includes the centroid of the object.

30. Apparatus as claimed in claim 29, wherein said combined spatial characteristic includes spatial standard deviations from the centroid calculated for the plurality of preceding images.
31. Apparatus as claimed in claim 20, wherein said spatial characteristic includes a measure of the size of the object.
32. Apparatus as claimed in claim 20, wherein said object matching means treats objects which are not matched to objects in the preceding image as potential new objects.

33. Apparatus as claimed in claim 20, wherein said object tracking means converts objects which have been matched to objects in the preceding frame for a predetermined number of frames to a line segment, by fitting a straight line between the centroids of each object.
34. Apparatus as claimed in claim 33, wherein said object tracking means joins said line segments to form trajectories showing the movement of said object across said scene, by determining whether a following line segment is sufficiently related to a preceding line segment.
35. Apparatus as claimed in claim 20, further including masking means for masking undesirable image elements by determining whether the comparison value of an image element is not sufficiently similar to the background value too frequently to relate to a moving object and excluding said unreliable image elements from further processing.
36. Apparatus for tracking an object added to or removed from a scene, the apparatus including: (a) image acquisition means; (b) background image generation means for generating a series of background images of the scene, said background image generation means generating each background image from a plurality of images of said scene acquired by said image acquisition means, said background image generation means dividing each said image into a plurality of background image elements, determining a value of a characteristic of each background image element for each image to obtain a set of values and determining from said set of values a background value for each of said background image elements, wherein determining a value of a characteristic for each image element involves determining the mode of a set of values obtained for each image element from said plurality of images; (i) background comparison means for comparing each new background value with an old background value of a corresponding background image element of an old background image which precedes said new background image by a number of background images to determine whether said new value is sufficiently similar to said old value to thereby determine whether or not each new image element is sufficiently similar to the corresponding old image element; (ii) determining, from which new image elements are not sufficiently similar to the corresponding old image element, which new image elements relate to potential objects and determining a spatial characteristic of each said potential object; (c) object matching means for matching potential objects in consecutive background images by comparing spatial characteristics of potential objects of a following background image of said consecutive images with spatial characteristics of potential objects of a preceding background image of said consecutive background images to match potential objects in the following image to potential objects in the preceding image; and (d) object tracking means which treats each matched potential object as an object added to or removed from the scene.
37. Apparatus for tracking an object as claimed in claim 36, wherein said background image generation means assigns confidence limits to each value and said background comparison means determines that the value of a new background image element is sufficiently similar to the value of a corresponding old background image element if the mode of a new image element falls within the confidence limits of the old image element.
38. Apparatus as claimed in claim 36, wherein old background images precede new background images by three background images and said object tracking means treats matched potential objects as new objects if they persist for three consecutive background images.